The etiology of inflammatory bowel diseases is only partially explained by the current genetic risk map. It is hypothesized that environmental factors modulate the epigenetic landscape and thus contribute to disease susceptibility, manifestation, and progression. To test this, we analyzed DNA methylation (DNAm), a fundamental mechanism of long-term epigenetic modulation of gene expression. We report a three-layer epigenome-wide association study (EWAS) using intestinal biopsies from 10 monozygotic twin pairs (n = 20 individuals) discordant for manifestation of ulcerative colitis (UC). Genome-wide expression scans were generated using Affymetrix UG 133 Plus 2.0 arrays (layer 1). Genome-wide DNAm scans were carried out using Illumina 27k Infinium Bead Arrays to identify methylation variable positions (MVPs, layer 2), and MeDIP-chip on Nimblegen custom 385k Tiling Arrays to identify differentially methylated regions (DMRs, layer 3). Identified MVPs and DMRs were validated in two independent patient populations by quantitative real-time PCR and bisulfite pyrosequencing (n = 185). The EWAS identified 61 disease-associated loci harboring differential DNAm in cis of a differentially expressed transcript. All constitute novel candidate risk loci for UC not previously identified by GWAS. Among them are several that have been functionally implicated in inflammatory processes, e.g., complement factor CFI, the serine protease inhibitor SPINK4, and the adhesion molecule THY1 (also known as CD90). Our study design excludes nondisease inflammation as a cause of the identified changes in DNAm. This study represents the first replicated EWAS of UC integrated with transcriptional signatures in the affected tissue and demonstrates the power of EWAS to uncover unexplained disease risk and molecular events of disease manifestation.

[Supplemental material is available for this article.] 



Ulcerative colitis (UC) represents one major subphenotype (OMIM 191390) of human inflammatory bowel disease (IBD, OMIM 266600) and is characterized by chronic inflammation of the intestinal mucosa, exhibiting a continuous pattern in the affected tissue. In the past decades, the incidence of the disease has risen remarkably steeply, a rise that cannot be explained by genetic variants alone. Current estimates attribute ~16% of the disease heritability to identified genetic variants (Anderson et al. 2011), while environmental changes interacting with genetic predisposition (Hampe et al. 1999; Stoll et al. 2004; Franke et al. 2008, 2010) are discussed as major determinants of disease manifestation (Rosenstiel et al. 2009). This is in concordance with observations for other complex diseases (Manolio et al. 2009). However, neither the relapsing/remitting nature of the inflammatory process nor the delayed onset after several decades of health is understood. To address this gap, several studies have focused on the functional genomics of UC in order to obtain a high-resolution transcriptional picture of disease processes in the inflamed mucosa (Dieckgraefe et al. 2000; Lawrance et al. 2001; Costello et al. 2005).
Beyond germline DNA variants, epigenetic variants, e.g., DNA methylation (DNAm) and histone modifications, could modulate disease-relevant gene function (Petronis 2010). Epigenetic patterns are highly tissue-specific (Rakyan et al. 2008) and can be influenced by environmental factors (Mann et al. 2004; Heijmans et al. 2008). Recent technological advances have made it possible to assess the degree of methylation of specific CpG sites that act as regulators of disease-associated transcripts (van Overveld et al. 2003; Petronis 2010). Consequently, epigenetic modifications represent promising candidates for elucidating processes of disease manifestation beyond the identified risk loci. A genome-wide study assessing the role of DNAm in the primary diseased tissue of a chronic inflammatory disorder such as UC could therefore reveal signatures that capture the environmental influence on patients. A particularly powerful approach for addressing epigenetic mechanisms is the study of discordant monozygotic (MZ) twins (Bell and Spector 2011). Except for highly penetrant Mendelian disorders, MZ twins exhibit a broad range (25%-95%) of disease discordance for most complex diseases that cannot be explained by classical genetics (Nance 1977). A recent report showed a discordance rate of 84% in MZ twins for UC (Spehlmann et al. 2008). While twin-based studies have historically been employed to identify disease genes, recent interest in MZ twin studies has moved to mapping epigenetic changes in identical genomes that have given rise to different phenotypes (Fraga et al. 2005; Kaminsky et al. 2009; Javierre et al. 2010; Rakyan et al. 2011a).
Starting with a collection of mucosal biopsies from 20 MZ twins (10 pairs) discordant for UC, we aimed to generate a genome-wide functional epigenetic map independent of the genetic background by combining three essential layers of information: (1) genome-wide mRNA expression profiling; (2) genome-wide quantification of methylation variable positions (MVPs), representing individual methylation events within the proximal promoter regions of transcription start sites; and (3) genome-wide assessment of differentially methylated regions (DMRs), representing group effects of linked MVPs. Combining these three layers with validation of selected findings in an independent cohort of unrelated individuals illustrates the potential impact of functional epigenetics on disease mechanisms in UC.

Results 

Three-layer genome-wide scans for differential gene expression, MVPs, and DMRs in twins discordant for UC

The transcriptome analysis (layer #1) of the mucosa of 20 MZ twins discordant for UC identified 18,097 of the 54,675 transcripts analyzed as expressed. Of these, 361 were significantly differentially expressed (Benjamini-Hochberg-corrected P-value < 0.05; FDR [false discovery rate] < 5%), and 356 were located within 50 kb of at least one MVP or DMR. The MVP analysis (layer #2) involved interrogation of 27,578 CpG sites, of which 23,085 were informative, resulting in the identification of 703 MVPs between healthy and diseased individuals (Benjamini-Hochberg-corrected P-value ≤ 0.05). The DMR analysis (layer #3) involved the examination of 392,750 positions, of which 389,359 were informative in the mucosal tissue sample, resulting in the identification of 345 DMRs between healthy and diseased individuals (Benjamini-Hochberg-corrected P-value < 0.05).
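The 50 kb proximity criterion used to pair transcripts with MVPs or DMRs amounts to a window lookup around each transcription start site. The following is a minimal Python sketch under stated assumptions, not the study's pipeline: each MVP or DMR is reduced to a single coordinate (a hypothetical simplification for DMRs), all coordinates share one genome assembly, and the function names and example positions are invented for illustration.

```python
from bisect import bisect_left, bisect_right

# Half-width of the cis-interaction window around each TSS (50 kb, as in the text).
WINDOW_BP = 50_000


def features_near_tss(tss, sorted_positions, window=WINDOW_BP):
    """Return feature positions within [tss - window, tss + window], inclusive.

    `sorted_positions` must be sorted ascending; bisect makes each lookup O(log n).
    """
    lo = bisect_left(sorted_positions, tss - window)
    hi = bisect_right(sorted_positions, tss + window)
    return sorted_positions[lo:hi]


def cis_pairs(transcripts, features, window=WINDOW_BP):
    """Map each transcript to the MVP/DMR positions inside its cis window.

    transcripts: dict of transcript name -> (chromosome, TSS coordinate)
    features:    dict of chromosome -> list of MVP/DMR coordinates
                 (hypothetical point representation; a DMR is assumed to be
                 reduced to one position, e.g., its midpoint)
    Transcripts without any nearby feature are left out of the result.
    """
    by_chrom = {chrom: sorted(pos) for chrom, pos in features.items()}
    pairs = {}
    for name, (chrom, tss) in transcripts.items():
        hits = features_near_tss(tss, by_chrom.get(chrom, []), window)
        if hits:
            pairs[name] = hits
    return pairs


# Invented coordinates, for illustration only.
transcripts = {"gene_A": ("chr10", 1_000_000), "gene_B": ("chr4", 500_000)}
features = {"chr10": [1_060_000, 950_000, 1_049_999], "chr4": [600_000]}
print(cis_pairs(transcripts, features))  # {'gene_A': [950000, 1049999]}
```

Keeping the feature coordinates sorted per chromosome lets each transcript be checked with two binary searches instead of a scan over all 703 MVPs and 345 DMRs.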

A genome-wide functional epigenetic map for UC 

Integration of the three layers illustrates that epigenetic differences as well as their potential transcriptional consequences can be monitored in complex primary tissue on a genome-wide scale (Fig. 1; for high-resolution maps of individual chromosomes, see Supplemental Fig. S1A-X). The observed DNAm follows the expected bimodal distribution (Supplemental Fig. S2). Hypomethylation is prominent in promoter regions (Supplemental Fig. S3A), while hypermethylation is more frequent in gene introns (Supplemental Fig. S3B). Similar patterns of variability were observed for all three layers, as documented by the intra-class correlations (Supplemental Fig. S4). The principal component analysis identified disease as the strongest component. Notably, mRNA and DNAm did not contribute equally to the separation between healthy and diseased individuals (Supplemental Fig. S5). Collectively, the three-layer analysis identified 61 disease-associated genes that were defined by differential expression and at least one MVP or DMR within a cis-interaction window of 50 kb from the transcription start site (Supplemental Table S1). Disease-associated transcripts were defined by their significantly differential mRNA expression when comparing healthy individuals to UC patients (Benjamini-Hochberg-corrected P-value < 0.05; false discovery rate ≤ 5%). Unsupervised clustering of these 61 mRNA/DNAm pairs revealed patient similarities in mRNA and DNAm levels (Supplemental Fig. S6). Correlation analysis of quantitative expression values and MVP or DMR data revealed a correlation in selected candidate transcripts (median Spearman's r = 0.58) and in identified candidate loci (r = 0.49), but no correlation in the group of disease-associated transcripts (r = 0.15) or when analyzing all transcripts (r = 0.16). The correlation within the selected candidates was significantly different from that of all other gene sets (Supplemental Fig. S7).

[Figure 1 image: (A) genome-wide profiles; x-axis: genomic alignment (bp × 10^8); legend: expression up/down. (B) HKDC1 locus.]

Figure 1. (A) Genome-wide profiles of DNAm and gene expression. The x-axis represents the genomic location, with chromosomes represented as rows. The y-axis represents the significance of the differences between UC patients and healthy individuals and is displayed as -log(p) for mRNA and as log(p) for the epigenetic modifications. Effect size (induction/fold change) is encoded by colors. Methylation: (black) up-regulated, (blue) not regulated, (purple) down-regulated; mRNA expression: (red) up-regulated, (yellow) no regulation, (green) down-regulated. (B) Higher-resolution map for the selected candidate locus HKDC1 (display and color coding according to Fig. 1A; the dotted line indicates the significance threshold). Higher-resolution versions of this map are provided in Supplemental Figure S1A-X.

When a gene ontology (GO) analysis was performed, a large number of differentially expressed transcripts (independent of MVPs and DMRs) were associated with inflammation, immune response, and other disease-relevant processes. Similarly, differentially expressed transcripts in close proximity to MVPs and/or DMRs were also associated with inflammation and immune response (Supplemental Fig. S8). The variant patterns observed in the validation panel were highly robust, as documented by the frequencies of their respective occurrences: a random individual from the validation panel has a median chance of over 80% of exhibiting the variant pattern described for the 10 candidate loci (Supplemental Fig. S9).
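The correlations between quantitative expression and methylation values reported above are Spearman rank correlations, i.e., the Pearson correlation of the two variables' ranks, with tied values receiving their average rank. The self-contained sketch below illustrates that computation and the summary of per-locus correlations by their median; the input values and locus names are invented, and the software actually used in the study is not described in this section.

```python
import math
from statistics import median


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples with at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (sx * sy)


# Invented per-individual values for three hypothetical loci:
# (mRNA expression, relative methylation in %).
loci = {
    "locus_1": ([1.0, 2.0, 3.0, 4.0, 5.0], [80, 75, 70, 60, 55]),
    "locus_2": ([2.0, 1.5, 3.5, 4.0, 0.5], [60, 65, 50, 52, 70]),
    "locus_3": ([1.0, 2.0, 3.0, 4.0, 5.0], [5, 6, 7, 8, 7]),
}
rhos = {name: spearman_rho(x, y) for name, (x, y) in loci.items()}
print(rhos, median(rhos.values()))
```

Using ranks rather than raw values makes the measure insensitive to the very different scales of array intensities and methylation percentages.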

A direct comparison of the results for mRNA, MVP, and DMR 
levels to 47 previously published UC-associated loci (Anderson 
et al. 2011) revealed that none of those loci exhibited significant 
alterations for any of the three layers investigated (Supplemental 
Table S2). 
Validation of selected candidate transcripts under potential epigenetic control

Of the 61 disease- and MVP/DMR-associated transcripts, the 10 candidates with the closest proximity between MVPs/DMRs and transcript location were subjected to further validation and replication in a larger collection of primary tissue from the intestinal mucosa (Table 1, validation panels I and II). This was performed by pyrosequencing (bisulfite-modified DNA, n = 50 individuals) for methylation analysis and TaqMan-based real-time PCR (n = 135 individuals) for differential mRNA expression. Significance criteria were defined in concordance with the initial scan (Benjamini-Hochberg-corrected P-value ≤ 0.05) (details on the panel of patients and controls are listed in Table 1; for a detailed description of the methods, see Supplemental Material). The results are summarized in Table 2. We selected distinct patterns of differential DNAm/expression pairs as paradigms for different scenarios of epigenetic modulation, while focusing on the commonly accepted patterns of hypermethylation associated with transcriptional repression and hypomethylation associated with increased mRNA expression: (1) pairs with negative correlation of DNAm and mRNA expression, represented by 40 MVPs and five transcripts: increased DNAm accompanied by decreased mRNA expression (MT1H) and decreased DNAm with increased mRNA expression (CFI, HKDC1, SPINK4, THY1); (2) pairs with positive correlation of DNAm and mRNA expression, represented by 21 MVPs and three transcripts: increased DNAm accompanied by increased mRNA expression (TK1) and decreased DNAm with decreased mRNA expression (FLNA, PTN). Control loci were chosen to replicate MVPs/DMRs without cis-modulation of mRNA expression (SLC7A7, 4 MVPs) or differential mRNA expression without multiple MVPs or DMRs (IGHG1, 1 MVP). We found MVPs and DMRs associated with alterations in expression levels both at the start of a CpG island (HKDC1, TK1) and at the end of a CpG island (all other candidates), locations generally referred to as CpG island shores (Doi et al. 2009; Irizarry et al. 2009). Results for one selected example of a validated disease-associated transcript are shown in Figure 2 (HKDC1). A substratification analysis performed on validation panels I and II revealed no influence of potential confounding factors such as age, gender, biopsy location, and medication on the main findings (Supplemental Fig. S10, mRNA; Supplemental Fig. S11, DNAm).
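The grouping of validated DNAm/mRNA pairs into negative and positive correlation patterns depends only on the direction of change in each layer. A small sketch of that rule, populated with the directions reported for the 10 validated loci in the text and in Table 2; the function name and the "control" label for loci without regulation in one layer are illustrative choices, not terminology from the study.

```python
# Direction of change in UC vs. healthy individuals, as described in the text:
# (DNAm direction, mRNA direction); None marks "no regulation" in that layer.
VALIDATED_LOCI = {
    "MT1H": ("up", "down"),
    "CFI": ("down", "up"),
    "HKDC1": ("down", "up"),
    "SPINK4": ("down", "up"),
    "THY1": ("down", "up"),
    "TK1": ("up", "up"),
    "FLNA": ("down", "down"),
    "PTN": ("down", "down"),
    "SLC7A7": ("down", None),  # control: DNAm change without mRNA change
    "IGHG1": (None, "up"),     # control: mRNA change without DNAm change
}

SIGN = {"up": 1, "down": -1}


def correlation_pattern(dnam, mrna):
    """Classify a DNAm/mRNA pair by whether both layers change in the same direction."""
    if dnam not in SIGN or mrna not in SIGN:
        return "control"
    return "positive" if SIGN[dnam] == SIGN[mrna] else "negative"


groups = {}
for gene, (dnam, mrna) in VALIDATED_LOCI.items():
    groups.setdefault(correlation_pattern(dnam, mrna), []).append(gene)
print(groups)
```

The "negative" group corresponds to the classical hypermethylation/repression paradigm; the "positive" group is the one the Discussion notes is not explained by current models of epigenetic regulation.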

Combining the results of the three-layer scan and validation showed 148-fold more MVPs in proximity to the 10 validated loci than expected by chance, corresponding to a P-value of 9.11 × 10^-44 (Fisher's exact test).
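An enrichment of this kind can be expressed as a 2 × 2 table of CpG sites (inside vs. outside the candidate windows, MVP vs. not MVP), from which the fold enrichment (observed over expected MVPs inside the windows) and a one-sided Fisher's exact P-value follow. The sketch below assumes that table layout, which the text does not spell out, and the counts in the usage line are made up for illustration; they are not the study's data.

```python
from math import comb


def fold_enrichment(a, b, c, d):
    """Observed / expected MVPs inside the windows for the table [[a, b], [c, d]].

    Rows: CpG sites inside / outside the candidate windows.
    Columns: MVP / not MVP.
    """
    n = a + b + c + d
    expected = (a + b) * (a + c) / n
    return a / expected


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test, P(X >= a), under the hypergeometric null.

    X counts MVPs among the a + b in-window sites, given fixed table margins.
    Exact integer arithmetic keeps the tail accurate until the final division.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / comb(n, col1)


# Made-up counts: 3 of 4 in-window sites are MVPs vs. 1 of 4 outside.
print(fold_enrichment(3, 1, 1, 3), fisher_exact_greater(3, 1, 1, 3))
```

With genome-scale counts the same functions apply unchanged; very small tails may underflow to 0.0 in the final floating-point division.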



Discussion 

In ulcerative colitis, the contribution of genetic variants to the disease risk is estimated at ~16% (Anderson et al. 2011). Together with other factors (Hampe et al. 1999; Stoll et al. 2004; Franke et al. 2008, 2010), epigenetic modifications may explain part of the remaining disease risk. While disease-relevant epigenetic modifications as well as their interaction with environmental factors have been shown for other diseases (van Overveld et al. 2003; Mann et al. 2004; Heijmans et al. 2008; Rakyan et al. 2008, 2011a; Petronis 2010), such effects have not been shown in ulcerative colitis. Therefore, this study aimed to create a functional epigenetic map by combining genome-wide data on differentially methylated regions, methylation variable positions, and transcriptome data, the latter reflecting potential effects of the epigenetic variations. A group of 20 monozygotic twins discordant for UC was selected as an entry point for a three-layer genome-wide scan targeting mechanisms that show significant differences between healthy and diseased individuals. The validity of such an approach has recently been demonstrated in psoriasis and lung adenocarcinoma (Gervin et al. 2012; Selamat et al. 2012). Such approaches cannot, however, demonstrate the causality of epigenetic modifications for altered mRNA levels. In addition, the features measured by the platforms employed restrict the number of detectable interactions between DNAm and mRNA expression. Keeping these limitations in mind, the identification of 61 cis-links between epigenetic modifications and disease-associated transcripts presented here supports the hypothesis that pathophysiological events are a reflection of, and potentially controlled by, epigenetic modifications with consequences for transcription. The identified cis-links show both negative and positive correlations between DNAm and RNA expression, which is in concordance with two recent studies applying similar approaches (Gervin et al. 2012; Selamat et al. 2012), yet the nature of positive correlations remains unexplained by the current understanding of epigenetic regulation. The majority of identified loci are associated with biological processes that are either directly or indirectly linked to immune processes, which is consistent with previous findings on the functional genomics of UC (Dieckgraefe et al. 2000; Lawrance et al. 2001; Costello et al. 2005), as well as with findings on UC genetics (Anderson et al. 2011). Interestingly, a smaller proportion of the biological processes identified are distinct from bona fide immune processes: structural, developmental, and metabolic processes are prominent findings. This suggests additional levels of transcriptional control and is concordant with a previous study showing that immune response-associated biological processes are not primarily under genetic control in mucosal tissue (Hasler et al. 2009).



Table 1. Study panels used for genome-wide assessment of differential expression and DNAm as well as for validation of initial findings

Panel type | Application | Relationship of individuals | Gender representation | Age: median, range | Disease representation
Screening panel | Genome-wide screening of the transcriptome and methylome | Discordant twins (UC, HN) | 10 f; 10 m | 25, 18-70 | 10 UC; 10 NC
Validation panel I | Validation of the findings in the transcriptome | Unrelated individuals | 69 f; 66 m | 41, 18-76 | 30 UCi; 30 UCni; 15 DCi; 30 DCni; 30 NC
Validation panel II | Validation of the findings in the methylome | Unrelated individuals (subset of panel I) | 25 f; 25 m | 41, 18-68 | 20 UC; 10 DC; 20 NC

(f) Female, (m) male, (UC) ulcerative colitis, (NC) normal controls, (DC) disease specificity controls; suffix: (i) inflamed, (ni) not inflamed.
Table 2. Validation of differential DNAm linked to disease-associated transcripts by TaqMan real-time PCR (mRNA expression in n = 135 individuals, validation panel I) and pyrosequencing (CpG methylation in n = 50 individuals, validation panel II)

Gene symbol; name | Prominent gene ontology terms (selected) | Cytogenetic band | CpG position relative to transcript | CpG methylation P-value (observations per gene) | mRNA regulation fold change; P-value
CFI; complement factor I | Complement activation; innate immune response | 4q25 | Post-TSS, SGB | ↓ 1.51 × 10^-5 (2) | ↑ 5.71; 4.45 × 10^-9
FLNA; filamin A, alpha | Positive regulation of I-κB kinase/NF-κB cascade | Xq28 | Post-TSS, SGB | ↓ 1.63 × 10^-2 (2) | ↓ -2.85; 7.72 × 10^-6
HKDC1; hexokinase domain containing 1 | Glycolysis | 10q22.1 | Pre-TSS and post-TSS | ↓ 7.94 × 10^… (6) | ↑ 2.01; 5.16 × 10^-6
IGHG1^a; immunoglobulin heavy constant gamma 1 (G1m marker) | Immune response; antigen binding | 14q32.33 | Post-TSS, SGB | (-) 1.56 × 10^-1 (0) | ↑ 8.25; 8.90 × 10^-9
MT1H; metallothionein 1H | Metal ion binding; protein binding | 16q13 | Post-TSS | ↑ 1.50 × 10^-2 (2) | ↓ -1.66; 3.04 × 10^-5
PTN; pleiotrophin | Positive regulation of cell proliferation | 7q33 | Post-TSS | ↓ 2.42 × 10^-2 (1) | ↓ -4.16; 2.76 × 10^-8
SLC7A7^b; solute carrier family 7 (cationic amino acid transporter, y+ system), member 7 | Blood coagulation, leukocyte migration | 14q11.2 | Post-TSS | ↓ 5.21 × 10^-6 (4) | (-) 1.16; 6.73 × 10^-1
SPINK4; serine peptidase inhibitor, Kazal type 4 | Serine-type endopeptidase inhibitor activity | 9p13.3 | Post-TSS, SGB | ↓ 3.03 × 10^-6 (2) | ↑ 13.05; 4.45 × 10^-9
THY1; Thy-1 cell surface antigen | Positive regulation of T cell activation | 11q23.3 | Post-TSS | ↓ 2.28 × 10^-3 (2) | ↑ 2.99; 4.75 × 10^-7
TK1; thymidine kinase 1, soluble | DNA replication | 17q23.2-q25.3 | Pre-TSS | ↑ 3.08 × 10^-3 (2) | ↑ 1.32; 2.30 × 10^-2

(TSS) Transcription start site; (SGB) spanning to gene body; in case of multiple CpGs, all relative positions are described. (↑) Up-regulation; (↓) down-regulation; (-) no regulation as measured in the validation panel. P-values presented were corrected for multiple testing using the Benjamini-Hochberg correction. In case of multiple observations, the lowest P-values are presented.
^a Negative control candidate for differential expression without epigenetic modification.
^b Negative control candidate for epigenetic modification without differential mRNA expression.



The design used here allowed direct comparison of individual MVPs and DMRs. While DMRs may be expected to have greater functional relevance than MVPs, this has not yet been formally demonstrated by any study. Consequently, both should be part of an epigenetic map to ensure the inclusion of all potentially relevant information. Based on the assessment of 23,085 CpG sites in our study, no single MVP was found that could serve as representative of an entire DMR. In addition, highly significant MVPs were found in loci not showing any differential gene expression. Together with our observation that a variable number of MVPs (in the case of our validated MVP/DMR-transcript pairs: 1-6) may potentially influence mRNA expression, these data provide no evidence that individual MVPs are of higher relevance than DMRs or vice versa.

Previous studies have addressed epigenetic modulation in the context of IBD mostly through a candidate gene approach focused on IBD-associated colorectal cancer (Moriyama et al. 2007; Dhir et al. 2008; Edwards et al. 2009; Gonsky et al. 2011). Given the high tissue specificity of epigenetic patterns (Rakyan et al. 2008) and the fundamental differences in detection, these studies are of only limited use for comparison with our systematic genome-wide approach. However, our results support one recent report stating that epigenetic dysregulation of the IRF5 promoter is not associated with IBD (Balasa et al. 2010). In contrast, many of our findings at the mRNA level are in concordance with previously published studies (Dieckgraefe et al. 2000; Lawrance et al. 2001; Anderson et al. 2011), potentially owing to the lower technical and/or biological variance of inflammation-associated mRNA patterns. Among the replicated findings, several genes have been directly associated with chronic intestinal inflammation: PTN (pleiotrophin) has been shown to be functionally linked to inflammation and cancer (Kadomatsu 2005), while THY1 (Thy-1 cell surface antigen) mediates cell adhesion during inflammation (Jurisic et al. 2010). The serine protease inhibitor SPINK4 has been shown to be differentially expressed in chronic autoimmune intestinal inflammation (celiac disease), likely derived from altered goblet cell activity, while no causative genetic variant was identified (Wapenaar et al. 2007).

The ontogenetic stability of DNAm over time cannot be assessed with our study design; however, our findings in the enlarged validation and replication panel of unrelated patients and controls document the relevance of our results for a more heterogeneous population. Importantly, we examined the disease specificity of our results by including inflammatory disease controls (non-IBD-associated inflammation in the sigmoid mucosa) in our validation panels. None of the validated candidate genes showed DNAm variation together with altered transcript levels when comparing disease controls to healthy individuals. Significantly altered DNAm between inflammatory disease controls and healthy individuals was found only for the SLC7A7 locus; however, this was not accompanied by altered mRNA levels. These results corroborate the potential disease specificity of the identified effects and further support the concept of a distinct impairment of mucosal homeostasis in UC that is not common to intestinal inflammation in general.

Interestingly, all of the identified disease-associated alterations at the mRNA, MVP, or DMR level are novel, as they differ from the 47 previously published UC risk loci identified by GWAS (Anderson et al. 2011). None of the UC risk loci are considered to be methylation quantitative trait loci (Zhang et al. 2010). Furthermore, none of the risk loci are located in a regulatory region potentially affecting mRNA expression. In addition, the use of monozygotic twins for the discovery phase favors the identification of candidate variants that are independent of genetic effects. This also reflects fundamental differences between GWAS and EWAS approaches, as outlined recently (Rakyan et al. 2011b): EWAS, especially when conducted in discordant monozygotic twins and linked to transcriptome analysis, represents a powerful complementary tool to detect variations that cannot be captured by GWAS.

Genome Research 2133
www.genome.org

Häsler et al.

[Figure 2 image: (A) HKDC1 region around exons 12 and 13 with pyrosequenced CpGs and TaqMan assay Hs00228405_m1; (B) relative methylation [%] and mRNA expression in normal controls, disease controls, and UC patients; (C) mRNA-methylation correlation.]

Figure 2. Validation of DNAm and gene expression in selected candidate transcripts; example: HKDC1. (A) Relative position of CpGs (upward arrows, continuously numbered 1-8) and real-time PCR probe (dark gray). (B) Quantitative results of the validation: (1) methylation (via pyrosequencing; CpGs continuously numbered 1-8 in concordance with Fig. 2A) in n = 50 individuals (validation panel II), with assays displayed in the same order as in Figure 2A; (2) mRNA (via real-time PCR, right) in n = 135 individuals (validation panel I). (C) Correlation between mRNA expression (x-axis) and relative methylation (y-axis) in HKDC1 for a selected CpG; the dotted line represents the correlation trend (Spearman's r = -7.43). Significant differences between UC patients and normal controls are indicated by asterisks ((*) P < 5 × 10^-3; (**) P < 5 × 10^-4; (***) P < 5 × 10^-6).

Finally, the potential clinical relevance of our findings is supported by the high frequencies at which these effects occur: in 93% of all individuals, decreased methylation at these candidate loci is reflected by up-regulation of the corresponding transcript (or vice versa) when comparing UC patients with healthy controls. Moreover, we observed ~150-fold more MVPs in proximity to the validated candidate loci than expected by chance. This also indicates a high robustness of the cis-links between epigenetic modifications and the regulation of disease-associated transcripts.

In conclusion, our results indicate that changes in DNAm and their consequences for the transcriptome may represent disease mechanisms for UC, independent of genetic variation. The use of primary tissue from monozygotic disease-discordant twins, followed by validation of the findings in a larger and independent panel of UC patients, supports the potential disease relevance of our observations. In addition, the disease specificity of the observed events is strengthened by our study design, which controlled for confounding non-disease-associated variation, e.g., due to inflammation. While our data are suggestive of a link between DNAm and its effects on mRNA expression, it remains a challenge to formally establish causality. Nevertheless, the integrated three-layer functional map reported here will contribute toward a better understanding of IBD pathophysiology, further closing the gap between unexplained disease susceptibility and disease manifestation.

Methods 



Patient recruitment and patient characteristics

The 20 monozygotic twins discordant for ulcerative colitis (screening panel; median age: 25, range 18-70) recruited for this study were tested for mono/dizygosity as previously published (Barbaro and Cormaci 2004; von Wurmb-Schwark et al. 2005). The panel for real-time PCR validation of transcript levels consisted of 135 unrelated individuals (validation panel I; n = 30 ulcerative colitis, inflamed; n = 30 ulcerative colitis, noninflamed; n = 30 healthy individuals; n = 15 disease controls, inflamed; n = 30 disease controls, noninflamed; median age: 41, age range 18-76), a subgroup of which was used for pyrosequencing validation of methylation levels (validation panel II; n = 20 ulcerative colitis, inflamed; n = 20 healthy individuals; n = 10 disease controls, inflamed; median age: 41, age range 18-68). Healthy individuals included in the study were undergoing colonic cancer surveillance, had no previous unspecific changes in stool habits, and showed no significant pathological findings on endoscopic and histological examination. Ulcerative colitis patients were selected to display endoscopically active disease in the sigmoid colon at the time of sampling. Inflammation was assessed macroscopically during colonoscopy and categorized as (1) no signs of inflammation, (2) low inflammation, or (3) moderate/high inflammation; only tissue with no or moderate/high inflammation was included in the study (see Supplemental Table S3). More than 2000 patients were screened to recruit the study population. Disease-specificity controls included individuals with infectious diarrhea, other forms of gastrointestinal inflammation, or irritable bowel syndrome. The study setup was approved by the Bioethical Committee of the University of Kiel, where the patients were recruited. All patients gave written informed consent before data and biomaterials were collected. Patient characteristics are summarized in Table 1; a more detailed description is presented in Supplemental Table S3 (A: screening panel; B: validation panel I; C: validation panel II).
Biopsy processing and sample preparation

All biopsies used in this study are primary tissue from the intestinal mucosa. Biopsies were taken endoscopically from a defined area of the colon and immediately snap-frozen in liquid nitrogen. Total RNA was extracted and processed as previously described (Mah et al. 2004) and quality-controlled using an Agilent Bioanalyzer according to the manufacturer's guidelines. DNA was extracted from biopsies using the QIAamp Tissue DNA preparation kit (Qiagen).

Expression profiling (mRNA, layer I) 

Total RNA was prepared and hybridized to Affymetrix UG 133 Plus 2.0 arrays according to the manufacturer's protocol. Data were normalized using GCRMA (R, Bioconductor), and signals that were not present in at least 80% of the samples (cutoff: detection P-value < 0.05) were excluded from further analysis. The experimental and analytical parts of the microarray analysis were performed following the MIAME standards, including data submission to Gene Expression Omnibus (GEO, URL: http://www.ncbi.nlm.nih.gov/geo, series: GSE22619; samples: GSM560961-GSM560976). Differentially expressed genes were determined using the Mann-Whitney U-test; multiple testing correction was performed using the Benjamini-Hochberg method (Benjamini and Hochberg 1995), and a false discovery rate for the signed fold changes (based on the ratios of the group medians) was estimated from a Westfall and Young permutation using K = 5000 permutations (Westfall and Young 1993). Criteria for transcripts to be categorized as differentially expressed were: (1) corrected P-value < 0.05, and (2) FDR < 5%.
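Two of the filtering steps described above are simple enough to illustrate directly: the presence filter (a signal is kept only if its detection P-value is below 0.05 in at least 80% of samples) and the Benjamini-Hochberg step-up adjustment. The sketch below is a plain-Python illustration with made-up inputs and hypothetical probe names, not the GCRMA/Bioconductor workflow itself; the Westfall and Young permutation step is not reproduced here.

```python
def present_probes(detection_p, min_fraction=0.8, alpha=0.05):
    """Keep probes whose detection P-value is < alpha in >= min_fraction of samples."""
    kept = []
    for probe, pvals in detection_p.items():
        n_present = sum(p < alpha for p in pvals)
        if n_present / len(pvals) >= min_fraction:
            kept.append(probe)
    return kept


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order.

    Walk from the largest P-value down, scale each by m / rank, and carry the
    running minimum so adjusted values stay monotone in the raw P-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up detection P-values for two hypothetical probes across five samples.
detection = {
    "probe_1": [0.01, 0.02, 0.03, 0.2, 0.01],
    "probe_2": [0.01, 0.3, 0.4, 0.02, 0.6],
}
print(present_probes(detection))  # ['probe_1']
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))
```

A transcript then counts as significant when its adjusted value is below the 0.05 cutoff quoted in the text.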

Analysis of methylation variable positions (MVPs, layer II) 

DNA was prepared and analyzed using HumanMethylation27 BeadChips (iScan system, Illumina) as previously described (Van der Auwera et al. 2010), and the data set was subjected to intra-array and inter-array normalization as published earlier (Teschendorff et al. 2010). Differences between ulcerative colitis patients, diseased controls, and healthy individuals were determined using the Mann-Whitney U-test, and P-values were corrected according to Benjamini and Hochberg (Benjamini and Hochberg 1995). MVPs with a corrected P-value < 0.05 were considered significantly differentially methylated.
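The Mann-Whitney U-test used for each group comparison can be sketched as follows. This version uses the normal approximation with a tie correction and no continuity correction; that choice is an assumption, since the text does not state whether exact or approximate P-values were computed (with groups of about ten samples an exact test is also feasible). The resulting P-values would then be passed to a Benjamini-Hochberg correction.

```python
import math


def rank_with_ties(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).
    Returns (U for sample x, P-value)."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = rank_with_ties(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    # Tie correction: sum of (t^3 - t) over groups of t tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    # Two-sided P = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    return u1, math.erfc(abs(z) / math.sqrt(2))
```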

Analysis of differentially methylated regions (DMRs, layer III) 

DNA was prepared and hybridized to a custom tiling array (Nimblegen, custom 385k array) as previously described (Rakyan et al. 2008). The array was designed to cover known autoimmune- and inflammation-linked loci as well as specific genes with immune-regulatory function, and it encompassed all known promoters and CpG islands (both promoter and non-promoter CpG islands). Data were normalized by inter-quantile normalization using Spotfire for functional genomics (TIBCO). Differences between ulcerative colitis patients, diseased controls, and healthy individuals were determined using the Mann-Whitney U-test, and P-values were corrected according to Benjamini and Hochberg (Benjamini and Hochberg 1995). DMRs with a corrected P-value ≤ 0.05 were considered significantly differentially methylated.
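For illustration, standard quantile normalization across arrays is sketched below. It assumes, without confirmation from the text, that the "inter-quantile normalization" applied in Spotfire corresponds to this procedure; ties are broken by original order rather than averaged, which is a simplification.

```python
def quantile_normalize(columns):
    """Quantile-normalize equal-length arrays (one list of probe values per
    array). Each value is replaced by the mean, across arrays, of the values
    that share its rank, so all arrays end up with the same distribution."""
    n_rows = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value across arrays.
    reference = [
        sum(col[k] for col in sorted_cols) / len(columns) for k in range(n_rows)
    ]
    normalized = []
    for col in columns:
        # Positions of this array's values from smallest to largest.
        order = sorted(range(n_rows), key=lambda i: col[i])
        out = [0.0] * n_rows
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized
```

For two toy arrays `[[5, 2, 3], [4, 1, 6]]`, the reference distribution is `[1.5, 3.5, 5.5]`, and the arrays become `[5.5, 1.5, 3.5]` and `[3.5, 1.5, 5.5]`.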

Integration of three layers of genome-wide scans, candidate selection

To identify disease-associated transcripts under potential epigenetic control, differentially expressed transcripts from layer I were selected (corrected P ≤ 0.05, FDR ≤ 5%, regulated between healthy and diseased individuals). The genomic transcript locations were used to generate interaction windows extending 50 kb upstream and 50 kb downstream of the transcription start site (TSS). These windows were examined for whether they contained either a DMR (layer III, corrected P < 0.05 between healthy and diseased individuals) or an MVP (layer II, corrected P < 0.05 between healthy and diseased individuals). Transcripts significantly regulated between healthy and diseased individuals, with a significantly regulated DMR or MVP within the 50-kb window of the TSS, were considered candidates for disease-associated transcripts under potential epigenetic control. Quantitative expression values were correlated with the DMRs and MVPs within this window using Spearman's ρ. Differences between sets of correlations (all genes, disease-associated transcripts, and validated transcripts) were assessed using the Mann-Whitney U-test.
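The window-based integration step and the subsequent correlation can be sketched as below. The input layout is hypothetical: transcripts are `(name, chromosome, tss)` tuples, and each significant DMR or MVP is reduced to a single `(chromosome, position)` coordinate (for a DMR, e.g., its midpoint), which is an assumption since the text does not say how region overlap was scored. Spearman's ρ is computed as the Pearson correlation of average ranks.

```python
import math


def candidate_transcripts(transcripts, marks, flank=50_000):
    """Names of transcripts with at least one significant DNAm mark within
    +/- flank bp of the TSS. transcripts: (name, chrom, tss) tuples;
    marks: (chrom, position) tuples for significant DMRs/MVPs."""
    hits = []
    for name, chrom, tss in transcripts:
        if any(c == chrom and abs(pos - tss) <= flank for c, pos in marks):
            hits.append(name)
    return hits


def average_ranks(values):
    """1-based ranks with ties assigned their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

A window centered on the TSS is strand-independent, so strand is not needed for this check.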

Validation of differential mRNA expression via real-time PCR 

Real-time PCR (TaqMan) was performed according to the manufacturer's guidelines (Applied Biosystems) using a 7900HT Real-Time PCR System. Expression levels were calculated relative to beta-actin using the standard-curve method (Livak and Schmittgen 2001). Differences between ulcerative colitis patients, diseased controls, and healthy individuals were determined using the Mann-Whitney U-test, and P-values were corrected according to Benjamini and Hochberg (Benjamini and Hochberg 1995).
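The standard-curve quantification described above can be sketched as follows, assuming a dilution series of known input amounts for each assay: a line Ct = slope × log10(quantity) + intercept is fitted by least squares, sample quantities are read off the curve, and the target quantity is divided by the beta-actin quantity. Function names and the dilution values in the example are hypothetical.

```python
def fit_standard_curve(log10_quantities, ct_values):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    n = len(ct_values)
    mx = sum(log10_quantities) / n
    my = sum(ct_values) / n
    sxx = sum((x - mx) ** 2 for x in log10_quantities)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log10_quantities, ct_values))
    slope = sxy / sxx
    return slope, my - slope * mx


def quantity_from_ct(ct, slope, intercept):
    """Invert the standard curve to obtain the input quantity."""
    return 10 ** ((ct - intercept) / slope)


def relative_expression(target_ct, actb_ct, target_curve, actb_curve):
    """Target quantity normalized to the beta-actin quantity of the same sample."""
    target = quantity_from_ct(target_ct, *target_curve)
    reference = quantity_from_ct(actb_ct, *actb_curve)
    return target / reference
```

With a tenfold dilution series, a slope near -3.32 corresponds to roughly 100% amplification efficiency.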

Validation of differential DNAm via pyrosequencing 

Validation of initial findings was performed via bisulfite conversion followed by pyrosequencing (Roche, 454) as previously described (Bollati et al. 2007). Differences between ulcerative colitis patients, diseased controls, and healthy individuals were determined using the Mann-Whitney U-test, and P-values were corrected according to Benjamini and Hochberg (Benjamini and Hochberg 1995).
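After bisulfite conversion, unmethylated cytosines read as T while methylated cytosines remain C, so the methylation level at a CpG is commonly expressed as the C fraction of the C plus T signal. The sketch below assumes per-CpG C and T signal intensities are available and summarizes a locus by the mean over its CpGs; both the input format and the averaging step are assumptions, not details given in the text.

```python
def cpg_methylation_percent(c_signal, t_signal):
    """Percent methylation at one CpG: C / (C + T) * 100."""
    total = c_signal + t_signal
    if total <= 0:
        raise ValueError("no signal at this CpG position")
    return 100.0 * c_signal / total


def locus_methylation(cpg_signals):
    """Mean percent methylation over the CpGs of one assay.
    cpg_signals: list of (c_signal, t_signal) pairs."""
    values = [cpg_methylation_percent(c, t) for c, t in cpg_signals]
    return sum(values) / len(values)
```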

Gene ontology analysis 

Gene Ontology analysis was performed as previously published (Tavazoie et al. 1999). Biological processes associated with the transcripts and candidate genes were retrieved from the Gene Ontology Consortium (www.geneontology.org).
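Tavazoie et al. scored enrichment of functional categories with the hypergeometric distribution. The sketch below assumes the same one-sided test for a single Gene Ontology term; the parameter names are illustrative, and no multiple-testing correction across terms is shown.

```python
from math import comb


def hypergeometric_enrichment_p(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N, K, n):
    N  annotated genes in the background,
    K  background genes carrying the GO term,
    n  genes in the candidate set,
    k  candidate genes carrying the GO term."""
    upper = min(n, K)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)
```

With k = 0 the sum covers every possible overlap and the function returns 1.0, which is a quick sanity check.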

Determination of effect frequencies 

Frequencies of validated effects were determined by counting the occurrences in which an effect followed the direction of change observed when comparing the median signals of ulcerative colitis patients with those of healthy individuals. An effect was considered significant when both differential DNAm and differential mRNA expression showed Benjamini-Hochberg-corrected P-values < 0.05 in the validation experiment.
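One possible reading of this counting rule is sketched below: an effect counts when the UC-minus-healthy median difference in the validation panel has the same sign as in the screening panel and both the DNAm and mRNA corrected P-values are below 0.05. The data layout and dictionary keys are hypothetical, and the interpretation itself is an assumption.

```python
from statistics import median


def same_direction(screen_uc, screen_healthy, valid_uc, valid_healthy):
    """True if the UC-minus-healthy median difference has the same nonzero
    sign in the validation panel as in the screening panel."""
    screen = median(screen_uc) - median(screen_healthy)
    valid = median(valid_uc) - median(valid_healthy)
    return screen * valid > 0


def count_validated_effects(loci, alpha=0.05):
    """Count loci that replicate the screening direction and whose
    BH-corrected P-values for DNAm and mRNA are both below alpha."""
    return sum(
        1
        for locus in loci
        if locus["same_direction"]
        and locus["dnam_q"] < alpha
        and locus["mrna_q"] < alpha
    )
```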

Data access 

Genome-wide data sets (three layers) of all individuals included in the study have been submitted to the NCBI Gene Expression Omnibus (GEO) (http://www.ncbi.nlm.nih.gov/geo/) under series accession numbers GSE22619 (samples: GSM560961-GSM560976) and GSE27899 (samples: GSM688887-GSM688926).